Using unsupervised clustering approaches to identify common mental health profiles and associated mental health-care service-use patterns in Ontario, Canada

Abstract Mental health is a complex, multidimensional concept that goes beyond clinical diagnoses, including psychological distress, life stress, and well-being. In this study, we aimed to use unsupervised clustering approaches to identify multidimensional mental health profiles that exist in the population, and their associated service-use patterns. The data source was the 2012 Canadian Community Health Survey–Mental Health, linked to administrative health-care data; all Ontario, Canada, adult respondents were included. We used a partitioning around medoids clustering algorithm with Gower’s proximity to identify groups with distinct combinations of mental health indicators and described them according to their sociodemographic and service-use characteristics. We identified 4 groups with distinct mental health profiles, including 1 group that met the clinical threshold for a depressive diagnosis, with the remaining 3 groups expressing differences in positive mental health, life stress, and self-rated mental health. The 4 groups had different age, employment, and income profiles and exhibited differential access to mental health-care services. This study represents the first step in identifying complex profiles of mental health at the population level in Ontario. Further research is required to better understand the potential causes and consequences of belonging to each of the mental health profiles identified. This article is part of a Special Collection on Mental Health.

In Ontario, Canada, mental health care provided in general and specialist hospitals, as well as by physicians, including psychiatrists, is covered for all residents through the Ontario Health Insurance Plan (OHIP). Other types of mental health care, including psychological services and community-based care, are paid for privately, paid for out of pocket, or funded publicly outside of OHIP. In 2012, 1.6 million adults accessed publicly funded mental health outpatient care in Ontario, 130 000 accessed emergency care, and 50 000 accessed inpatient care. 1 In order to understand who may access and benefit from publicly funded mental health care, clinical experts have created mental health diagnoses representing combinations of symptoms with defined thresholds of severity, duration, and impact on functioning. 2 Using these diagnostic definitions, approximately 7 million Canadians and just under 1 in 10 Ontarians live with a mental or substance-use disorder. 3,4 However, mental health is a complex, multidimensional concept that is greater than simply the absence of symptoms of mental illness. 5,6][18][19][20] Positive mental health is also an important dimension of mental health that is orthogonal to mental illness; it represents the social, emotional, and psychological well-being of an individual. 5,6,21 A person with a diagnosable mental disorder can experience the presence or absence of well-being, and some studies have shown that persons with a mental illness who are deemed to be "flourishing" in their positive mental health have better functioning and resilience, as well as better mental health outcomes, including less suicidal behavior.
22,23 Because the manifestation of mental health within individuals is complex, it is not best captured using single diagnostic indicators. In addition, different combinations of dimensions of mental health or "mental health profiles" could be linked to different patterns of use of mental health-care services. Understanding mental health profiles among the population and how they are linked to mental health service-use patterns could therefore be highly useful to informing mental health-care planning.
Population segmentation is an approach that enables the grouping of individuals within a population who have similar characteristics, such as health-care patterns or health symptoms. 24 While segmentation was initially performed on the basis of clinical experience, data-driven methods, such as novel unsupervised clustering methods, can parse through complex data to find patterns and relationships that are not limited to a priori groups defined by clinicians. 25,260][31][32] No population segmentation tools for mental health exist yet in Ontario. In addition, most existing tools in other contexts focus on clinical populations, missing those who may have an unmet need for care.4][35][36][37] Further, segmentation tools often incorporate clinical needs and service use into a single tool.9][40] Although the factors contributing to service-use needs extend beyond clinically defined mental health to social determinants of health, services are funded and mandated to treat mental health problems. Through segmenting mental health profiles separately from care patterns, we can identify how service use tends to unfold for groups with similar profiles.
Establishing mental health profiles within the Ontario population along with their associated service-use patterns can inform the development of care pathways and programs designed to meet the service needs of these groups. These segments can also be used to inform policies and strategies that are targeted towards specific mental health profiles.
Therefore, the objective of this research was to use unsupervised clustering approaches to identify and characterize groups with unique mental health profiles in the Ontario population, and to identify the mental health service-use patterns associated with these profiles.

Data source and population
The primary data source for this study was the 2012 Canadian Community Health Survey, Mental Health edition (CCHS-MH), a cross-sectional survey disseminated to a representative sample of Canadians aged ≥15 years living in private dwellings. 41 To characterize mental health-care service use in the eligible cohort, the CCHS-MH data were linked with administrative data holdings available through Data & Analytic Services at the Institute for Clinical Evaluative Sciences (ICES), including the Registered Persons Database, the Ontario Mental Health Reporting System, the Canadian Institute for Health Information's (CIHI) Discharge Abstract Database, the National Ambulatory Care Reporting System, and OHIP.
All OHIP-eligible and enrolled adults aged ≥16 years who completed the 2012 CCHS-MH survey were included. Those under age 16 were excluded, since they were likely to access specialized child and adolescent mental health services funded outside of OHIP. Given that the age range of the CCHS-MH participants was 15 years or older, this excluded only a small number of individuals. Those who were ineligible for OHIP within 12 months of their CCHS-MH interview date in 2012 were excluded using the "OHIPELIG" macro developed by ICES.

Input variables
The variables included in the clustering algorithm are described in Table 1. All measures were captured in a single interview conducted between January and December 2012. Input variables were selected from the CCHS-MH measures to capture different dimensions or aspects of mental health that have previously been linked to likelihood of accessing services. These included clinically diagnosable mental health conditions; indicators of stress and negative mental health symptoms that may not meet the criteria for a mental health disorder (psychological distress, life stress); indicators of well-being (life satisfaction, self-rated mental health); and a measure of positive mental health (the Mental Health Continuum Short Form). 6,18 Where available, validated measures were selected; in their absence, single-item measures were selected. To check that the measures captured different constructs, we examined the correlations between them to confirm that they did not overlap completely.

Other CCHS-MH covariates
Sociodemographic variables such as age, ethnicity, educational attainment, immigrant status, and income are self-reported within the CCHS-MH. Geographic region (urban/rural) was measured using Statistics Canada's definition based on population density and census metropolitan area. 43

Mental health service use
Episodes of mental health or addiction-related care received by the eligible participants between January 1 and December 31, 2012, were identified using the administrative data available at ICES. All episodes of care within the Ontario Mental Health Reporting System were captured, while primary diagnostic codes were used to identify mental health or addiction-related inpatient visits within the CIHI Discharge Abstract Database and emergency care visits in the National Ambulatory Care Reporting System (diagnostic codes available in Tables S1-S3), and an existing algorithm was used to identify outpatient visits in OHIP data, with 80.7% specificity and 97.0% sensitivity. 44 Care received was subsequently summarized for each individual as the total count of (1) family physician visits, (2) psychiatrist visits, (3) emergency care visits, (4) hospitalizations, and (5) other specialist visits.
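The per-person summary described above amounts to turning a list of care episodes into a fixed-length vector of five counts. The following is a minimal sketch of that step only, not the ICES processing itself; the record layout, the service-type labels, and the function name `summarize_care` are hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical labels for the five service types listed in the text.
SERVICE_TYPES = (
    "family_physician",
    "psychiatrist",
    "emergency",
    "hospitalization",
    "other_specialist",
)


def summarize_care(episodes):
    """Collapse (person_id, service_type) episodes into one count vector per person.

    Returns a dict mapping person_id to a tuple of counts ordered as SERVICE_TYPES.
    """
    per_person = {}
    for person_id, service_type in episodes:
        per_person.setdefault(person_id, Counter())[service_type] += 1
    return {
        person_id: tuple(counts[s] for s in SERVICE_TYPES)
        for person_id, counts in per_person.items()
    }


# Made-up episodes for two illustrative people.
episodes = [
    ("A", "family_physician"),
    ("A", "family_physician"),
    ("A", "emergency"),
    ("B", "psychiatrist"),
]
summary = summarize_care(episodes)
```

People with no episodes do not appear in the result, which matches the later restriction of the service-use clustering to individuals with at least one care episode.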

Analyses
In order to identify clusters with similar mental health profiles, a partitioning around medoids (PAM) algorithm was used. 45 An extension of the k-means algorithm, the PAM procedure is a centroid-based clustering algorithm that uses an iterative process to identify clusters; each individual is first randomly assigned to a cluster, the cluster center is then calculated, and the proximity between each individual and each cluster center is measured, with individuals being assigned to the nearest cluster. This process is repeated until no changes in cluster membership occur.
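The iterative assign-and-update loop described above can be sketched in a few lines. This is a simplified alternating k-medoids procedure run on a precomputed distance matrix, written as an illustration under stated assumptions and not the software used in the study: it starts from randomly chosen medoids rather than a random assignment, it omits the build and swap phases of the full PAM algorithm, and the function names are hypothetical.

```python
import random


def assign(dist, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, seed=0, max_iter=100):
    """Alternating k-medoids on a square distance matrix (list of lists)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(dist)), k)  # random starting medoids
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:  # no change, so the loop has converged
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy example: six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2, seed=0)
```

Because the cluster center is always an observed individual, the procedure needs only pairwise distances, which is what allows a mixed-type proximity measure to be plugged in.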
As opposed to the k-means algorithm, which uses the mean as the cluster center, the PAM algorithm uses medoids to define the cluster center, which is robust to extreme outliers. 45 In the context of this study, since there may have been a small group of individuals with very poor mental health in the sample, this approach ensured that they did not have undue influence on the groupings identified, allowing us to identify variation between the majority of individuals. Gower's proximity function was used to define proximity between each individual and the cluster center, because it adapts to different types of data, including continuous, ordinal, and nominal/binary data, to produce an overall measure of distance between 0 (identical) and 1 (maximally dissimilar). 46 This clustering approach has been used previously to find groups using health-related administrative and survey data. 47,48 Since the PAM algorithm requires a prespecified number of clusters, it was repeated with 2-8 clusters, and internal cluster validity was assessed using the silhouette coefficient, a measure that captures the average similarity between individuals and their own cluster as compared with other clusters. 49 In addition to quantitative assessment of cluster performance, the utility and meaningfulness of the clustering solution was assessed, and both were weighed in selecting a final solution. Once the final clustering solution was identified, the clusters were characterized according to the input variables and assigned a summary label indicating the mental health profile they were deemed to represent. The final clustering solution was characterized using the sociodemographic variables available in the CCHS-MH.
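To make the mixed-type distance concrete, the sketch below computes a basic Gower distance: numeric variables contribute their absolute difference divided by the variable's range, categorical variables contribute 0 for a match and 1 for a mismatch, and the contributions are averaged. It is a minimal illustration with equal variable weights; treating ordinal items as integer-coded numeric variables is an assumption made here, and the example rows and the function name `gower_matrix` are hypothetical.

```python
def gower_matrix(rows, kinds):
    """Pairwise Gower distances for mixed-type data.

    rows  : list of equal-length tuples, one per individual
    kinds : "num" (continuous or integer-coded ordinal) or "cat" per column
    """
    # Range of each numeric column, used to scale differences to [0, 1].
    ranges = []
    for j, kind in enumerate(kinds):
        if kind == "num":
            column = [row[j] for row in rows]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)

    def pair_distance(a, b):
        total = 0.0
        for j, kind in enumerate(kinds):
            if kind == "num":
                # A constant column contributes nothing to the distance.
                total += abs(a[j] - b[j]) / ranges[j] if ranges[j] else 0.0
            else:
                total += 0.0 if a[j] == b[j] else 1.0
        return total / len(kinds)

    return [[pair_distance(a, b) for b in rows] for a in rows]


# Hypothetical individuals: (distress score, self-rated mental health 1-5,
# mood disorder yes/no). Values are made up for illustration.
rows = [(2, 5, "no"), (4, 4, "no"), (20, 1, "yes")]
kinds = ["num", "num", "cat"]
d = gower_matrix(rows, kinds)
```

In this toy data the first and third individuals sit at opposite extremes on every variable, so their distance is 1, the maximum the measure can take.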
Among individuals with any mental health or addiction-related care episode identified within the ICES administrative data, a similar clustering approach was used to identify service-use patterns. Specifically, a PAM algorithm was used to identify clusters with similar mental health service-use patterns. However, for services, a Bray-Curtis distance was used to accommodate the count data (numbers of each type of service use). 45,50 The solution that represented the best balance between cluster validity (measured using the silhouette coefficient) and utility was selected, using a similar approach as that used to identify mental health profile clusters. These clusters were used to characterize mental health service-use patterns among the different mental health profiles identified within the CCHS-MH data. An alluvial plot was produced to visualize the relationship between mental health profiles identified within the CCHS-MH and service-use patterns identified within the ICES administrative data.
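For count data, the Bray-Curtis distance between two individuals is the sum of absolute differences in their counts divided by the sum of all their counts. The sketch below illustrates that formula on made-up service-use vectors; the ordering of the five counts (family physician, psychiatrist, emergency, hospitalization, other specialist) and the function name are assumptions for the example.

```python
def bray_curtis(u, v):
    """Bray-Curtis distance between two vectors of non-negative counts."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two all-zero vectors are treated as identical. This case cannot arise
    # when every individual has at least one care episode.
    return numerator / denominator if denominator else 0.0


# Hypothetical annual counts per service type.
light_user = (2, 0, 0, 0, 0)
similar_user = (4, 0, 0, 0, 0)
psychiatry_user = (1, 12, 0, 0, 0)
emergency_user = (0, 0, 1, 0, 0)

d_similar = bray_curtis(light_user, similar_user)        # 2 / 6
d_psychiatry = bray_curtis(light_user, psychiatry_user)  # 13 / 15
d_disjoint = bray_curtis(light_user, emergency_user)     # 3 / 3
```

The distance is 0 for identical vectors and 1 for individuals who share no service type, so it emphasizes the mix of services used as well as their volume.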
Since the PAM algorithm is subject to random variation depending on the initial seed used to select cluster centers, both clustering algorithms were repeated with different seeds, as well as with trimmed values of extreme outliers (each count was trimmed at the 99.9th percentile). Solutions were compared using the adjusted Rand index, a measure of agreement across pairs of individuals adjusted for agreement by chance. 33,34 A high degree of similarity across solutions (most individuals are in the same cluster in both solutions) indicates that the cluster solution is robust against analytical decisions.
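The adjusted Rand index can be computed directly from the cross-tabulation of two sets of cluster labels. The sketch below uses the standard pair-counting formula and is offered as an illustration of the comparison step, not as the code used in the study; the function name and the toy labelings are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-adjusted agreement between two labelings of the same individuals."""
    n = len(labels_a)
    # Pairs of individuals placed together in both solutions.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each solution separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:  # degenerate case, e.g. both solutions have one cluster
        return 1.0
    return (together_both - expected) / (maximum - expected)


# The same partition with different label names scores 1.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that cuts across the first one scores below 0.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index depends only on which individuals are grouped together, it is unaffected by the arbitrary numbering of clusters across runs with different seeds.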

Results
In total, 4159 Ontario residents were identified among the 2012 CCHS-MH survey participants, of whom 73 (1.8%) were excluded due to being under 16 years of age at the time of interview, and 85 (2.0%) were excluded due to lacking OHIP eligibility within 12 months of their CCHS interview date. An additional 191 (4.6%) individuals were missing data on 1 or more input variables and were therefore excluded, leaving a final analytical sample of 3810. Sociodemographic and mental health characteristics of the analytical sample are shown in Table 2. Overall, the average age of the sample was 49 years, with a relatively even spread across age groups; 53% were female, and 83% resided in urban areas. Educational levels were high, with over two-thirds of participants having a postsecondary education, and more than half were employed in some capacity.
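The sample flow reported at the start of this section can be verified with simple arithmetic. The short check below uses only the numbers stated in the text; the variable names are illustrative, and the percentages are computed with the 4159 identified residents as the denominator, which is the reading that reproduces the reported values.

```python
identified = 4159
excluded = {
    "under 16 years of age": 73,
    "not OHIP-eligible": 85,
    "missing input variables": 191,
}

# Final analytical sample after all exclusions.
final_n = identified - sum(excluded.values())

# Each exclusion as a percentage of the residents initially identified.
shares = {
    reason: round(100 * count / identified, 1)
    for reason, count in excluded.items()
}

# Cluster sizes reported later in the Results should add up to the same sample.
cluster_sizes = [1321, 1615, 663, 211]
```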
Generally, mental health was positive in this sample, with the majority (over 90%) describing their mental health as good, very good, or excellent, and over three-quarters were "flourishing" according to the Mental Health Continuum Short Form positive mental health scale. Clinically diagnosable mental health disorders were rare, the most common being a mood disorder, which occurred among 6% of the sample. Most of the sample expressed at least some life stress, and around 1 in 5 participants in the sample felt they were under quite a bit of stress or extreme stress.
Figure 1 displays a silhouette plot showing the silhouette coefficients for the PAM algorithm by k-cluster. The 2-cluster solution produced the highest internal validity, and there was a second peak at 4 clusters, followed by a third peak at 8 clusters. After examining the clustering solutions according to their input variables, we deemed the 2-cluster solution to be of limited utility, primarily representing the presence or absence of positive mental health. The 8-cluster solution produced a finer-grained solution that began to identify all possible combinations of the input variables, but it was lacking in parsimony/utility due to the size of the solution. Therefore, the 4-cluster solution was deemed to represent the best balance between internal validity and usefulness, given its simplicity and ability to capture distinguishable groups. A visualization of the input variables across the 4 clusters is shown in Figure 2, and the 2-cluster and 8-cluster solutions are depicted in Figures S1 and S2, respectively. Sociodemographic information and input variables are displayed across the clusters in Table 3.
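The silhouette coefficient used to compare solutions across values of k can be computed from a distance matrix and a set of cluster labels. The sketch below follows the usual definition (for each individual, the mean distance to its own cluster is compared with the mean distance to the nearest other cluster) and is an illustration on toy data, not a reproduction of Figure 1; the function name is hypothetical, and at least 2 clusters are assumed.

```python
def silhouette(dist, labels):
    """Average silhouette coefficient; requires at least 2 distinct labels."""
    n = len(labels)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:  # a singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the individual's own cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Toy example: six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
good = silhouette(dist, [0, 0, 0, 1, 1, 1])  # labels follow the two groups
bad = silhouette(dist, [0, 1, 0, 1, 0, 1])   # labels cut across the groups
```

Values near 1 indicate compact, well-separated clusters; repeating the calculation for each candidate k gives the kind of profile summarized in a silhouette plot.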

Clusters
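Cluster 1 (n = 1321, 34.7%)-flourishing, minimal/no life stress
The first cluster, capturing just over one-third of the sample, was a group of participants who were mostly flourishing and absent of clinical mental health diagnoses, with very low psychological distress scores. Most of this group reported their mental health as very good or excellent, and almost all people in this group reported that their lives were not very stressful or not at all stressful. Compared with the other clusters, individuals in this group were slightly older, with over half aged 55 years or more, and approximately 1 in 5 resided in rural areas, the highest among the clusters. Over half were not currently employed and were possibly retired, given the older age profile of this group.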
Cluster 2 (n = 1615, 42.4%)-flourishing, some life stress
The second cluster, which included the largest percentage of respondents of all the clusters, represented those who were flourishing in their mental health and largely absent of clinical mental diagnoses. Average self-rated mental health was slightly lower in this group than in cluster 1, although the majority still reported their mental health to be very good or excellent. This group differed from cluster 1 in terms of their life stress, with all reporting at least "a bit" of life stress and one-quarter rating their level of life stress as "quite a bit" or "extreme." Cluster 2 had a high proportion of married individuals (59%), and this group had the highest proportion of employed individuals of all the clusters (68%). This group had a higher proportion of persons aged 35-55 years, which may explain their higher employment rates. However, this group also had the highest socioeconomic profile, with almost three-quarters reporting postsecondary school graduation, and over a quarter of this group reported an annual household income of CAD$100 000 or above.

Cluster 3 (n = 663, 17.4%)-moderate mental health and stress
Cluster 3 was a small cluster that represented a group with middling-to-poor mental health but largely without a clinically diagnosable mental illness. This group had an absence of positive mental health, with the majority receiving a rating of "moderate" and some "languishing." A small proportion met the criteria for an anxiety disorder (4.2%) or a substance-use disorder (8.4%); however, for the most part this group was absent of clinical mental health diagnoses. Self-rated mental health was lower in this group than in the first 2 clusters, with most reporting that their mental health was "good" or "fair." Most of this group reported at least some life stress, and they had higher average psychological distress scores compared with the first 2 clusters, though the scores were still relatively low. Cluster 3 was the only cluster that was majority male, with 53% identifying as such, and an even representation across age groups. This group also had a lower average household income, with 57% reporting earning CAD$60 000/y or less, despite the fact that this group had a higher proportion of individuals working full-time than cluster 1.

Cluster 4 (n = 211, 5.5%)-clinical mood disorder
Cluster 4 represented a small group of individuals who met the criteria for a clinically diagnosable mood disorder. Comorbid clinical conditions were common in this group, with over one-quarter meeting the criteria for an anxiety disorder and 1 in 6 meeting the criteria for a substance-use disorder. Almost a third of this group also reported suicidal thoughts. Self-rated mental health was lowest in this group, with most participants reporting their mental health as "fair" or "poor." Almost all individuals in this group reported some life stress, and psychological distress scores were elevated in comparison with the other groups. While most participants in this group did not have positive mental health, around 1 in 4 were still "flourishing." Cluster 4 had the youngest age profile of all the clusters, with an average age of 42 years and over a third under age 35 years. This group also had a higher proportion of female-identifying individuals than the other groups and were more likely to reside in urban areas. This group included a slightly lower proportion of recent immigrants and those who were married. Almost half of this group reported a household income of less than CAD$40 000/y.

Mental health service-use patterns
In total, 367 (9.6%) of the 3810 eligible CCHS-MH respondents accessed publicly funded mental health services in the calendar year 2012. When the PAM clustering procedure was conducted in this group, the solution with the highest silhouette coefficient was an 8-cluster solution. However, a 3-cluster solution was deemed to identify a meaningful and parsimonious summary of the service-use information with minimal loss of internal validity as compared with higher k-cluster solutions and was therefore selected (silhouette plot shown in Figure S3). The 3-cluster solution is presented in Figure 3. [...] psychiatrist visits throughout the year, sometimes accompanied by other general outpatient care. Note that while persons who are hospitalized may represent an important group of mental health service users, hospitalizations were rare among CCHS-MH respondents, and therefore this variable did not have a substantial impact on the clustering solution and was excluded from the plots due to low numbers.
Mental health service-use patterns, as represented by the 3 clusters identified, are presented by mental health profile in Figure 4. While most respondents in all groups did not access any mental health services, this varied across the 4 mental health profile clusters, and among service users, there appeared to be a gradient in access to increasingly intensive use of care with increasing negative mental health or stress. Although both groups largely comprised those without a clinically diagnosable mental health disorder, those with moderate mental health and stress (cluster 3) were 3 times more likely to access repeated psychiatrist care, twice as likely to access emergency care, and 1.5 times as likely to access light outpatient care than those who were flourishing with no life stress (cluster 1). Participants with a clinical mood disorder had an elevated likelihood of accessing all types of care in comparison with the other 3 groups; however, almost two-thirds of this group did not access any type of mental health care, despite having elevated depressive symptoms that met the clinical threshold. These differences were tempered by small numbers.

Discussion
This study demonstrates how an unsupervised clustering approach can identify more nuanced and complex groupings than single indicators alone, such as clinical diagnostic tools. We identified 4 mental health profiles among a representative sample of the general population in Ontario. One group was primarily defined by exhibiting clinically diagnosable depressive symptoms, while the other 3 groups were mostly without diagnosable mental health conditions but showed notable differences on mental health and well-being indicators that were linked to differential likelihood of service use. These mental health profiles were also found to have sociodemographic differences, including different age, employment, and income profiles.
Existing work that has been undertaken to examine mental health service use has typically focused on populations with a clinically diagnosable mental health disorder [28][29][30][31] or the most intense and costly use. 52 This study complemented and expanded on this work by focusing on a general population to better understand mental health and its relationship with mental health-care patterns. For example, we found that persons with moderate mental health but no clinically diagnosable disorder were more likely to access services than those who were flourishing and without life stress. We also found that approximately 1 in 20 of those who reported very good mental health received outpatient care, typically through a family doctor, while conversely over half of those with a diagnosable disorder did not access any care.
The reasons for these findings are complex; however, they do imply that current diagnostic indicators are only part of the picture in understanding mental health service use. While clinicians may not deem some of the mental health profiles we identified to represent persons who could benefit from treatment, it is clear that such individuals make up a nontrivial number of those accessing services. This may reflect a disconnect between clinician-assessed need (diagnostic indicators) and self-assessed need (indicated by access to services). 33,344][55] In future work, researchers might attempt to understand self-assessed service-use needs among persons with the same mental health profile in order to determine the extent to which these needs are being met. Beyond this, future work may seek to extrapolate and refine these profiles at the population level, by connecting these data with other information, such as long-term health care and economic data. Future work may also include qualitative interviews and consultations with clinical experts to identify relevant measurable outcomes specific to the population mental health profiles we identified, and to inform strategies and programs designed to improve these outcomes over time.
This study had some limitations. While the CCHS-MH identifies a representative sample of the general population, it is unlikely to capture individuals with severe or intense mental health problems or clinical needs. In addition, we identified publicly funded mental health services; however, due to lack of data, we could not capture or examine mental health services that are provided privately or through community-based programs. Therefore, it is not clear how these groups are accessing care outside of the public system. Finally, these were cross-sectional data; therefore, we could not assess whether participants accessing services had fewer symptoms due to receiving effective care. This may be more true for some patterns of service use (eg, repeated psychiatrist care) than for others (eg, emergency care or sparse general outpatient care). Future work with longitudinal data sets could also seek to expand our findings by examining the stability of profile membership over time.
In summary, this study represents the first step in modeling mental health as a multidimensional concept for identifying complex mental health profiles at the population level in Ontario. Further work is required to dig deeper into these profiles to inform strategies designed to improve outcomes within these groups.
data adapted from the Ontario Ministry of Health Postal Code Conversion File, which contains data copied under license from Canada Post Corporation and Statistics Canada. Parts of this material are based on data and/or information compiled and provided by CIHI and the Ontario Ministry of Health. The analyses, conclusions, opinions, and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred. This document also uses data adapted from Statistics Canada (Canadian Community Health Survey-Mental Health, 2012). This does not constitute an endorsement of this product by Statistics Canada.

Figure 1. Silhouette plot displaying internal cluster validity by k-cluster for a mental health clustering solution using data from the Ontario component of the Canadian Community Health Survey-Mental Health, 2012.

Figure 3. Visualization of a 3-cluster solution identifying mental health service-use patterns among respondents according to input variables, Canadian Community Health Survey-Mental Health, 2012.

Table 1. Variables in the 2012 Canadian Community Health Survey-Mental Health selected as clustering input variables.
Abbreviations: DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; WMH-CIDI, World Mental Health Composite International Diagnostic Interview.

Table 2. Sociodemographic and mental health characteristics of a cohort of Ontario, Canada, adults (n = 3810), Canadian Community Health Survey-Mental Health, 2012.

Table 2. Continued.
a Data presented are numbers and percentages unless otherwise indicated. b 0 = lowest, 24 = highest. c Mental Health Continuum Short Form positive mental health scale.

Table 3. Distribution (no. (%) a) of participants into clusters with distinct mental health profiles as determined by sociodemographic and mental health characteristics (n = 3810), Canadian Community Health Survey-Mental Health, 2012.
of clinical mental health diagnoses, with very low psychological distress scores. Most of this group reported their mental health as very good or excellent, and almost all people in this group reported that their lives were not very stressful or not at all stressful. Compared with the other clusters, individuals in this group were slightly older, with over half aged 55 years or more, and

Table 3. Continued.
a Data presented are numbers and percentages unless otherwise indicated. b A range of numbers (and percentages) is given when the true number was 5 or less, to reduce reidentification risk for individuals (plus 1 additional row within measures to avoid recalculation). c 0 = lowest, 24 = highest. d Mental Health Continuum Short Form positive mental health scale.
Figure 4. Alluvial plot and table demonstrating mental health service-use patterns across different mental health profiles of respondents, Canadian Community Health Survey-Mental Health, 2012. MHA, mental health and addictions.